Slides: bit.ly/2024-nicd-mlops
Access the virtual environment:
Master password: pomelo-jujube
Check in using the QR code!
https://mlops-intro.jumpingrivers.training/welcome/
Background in Astrophysics.
Data Scientist @ Jumping Rivers:
Python & R support for various clients.
Teach courses in Python, R, SQL, Machine Learning.
Hobbies include hiking and travelling.
↗ jumpingrivers.com 𝕏 @jumping_uk
Presentation slides (bit.ly/2024-nicd-mlops).
R coding demos in virtual environment.
Exercises will apply the demo code to a different dataset and model.
Not an R user?
Introduction to MLOps
Building a basic MLOps workflow
We will be deploying models locally only…
Want to contribute?
What is MLOps?
Discuss
Extra points if you contribute…
The typical data science workflow:
MLOps: Machine Learning Operations
Let’s set up a basic MLOps workflow!
Palmer Penguin dataset
Let’s predict species using flipper length, body mass and island!
Open demo.R in RStudio…
Attempt “Task 1: Data loading and tidying”
Need help? Check demo.R or raise your hand
Not an R user? The solution can be found in solutions.R
Finished? Scan the QR code for extra points!
Loading and importing
Tidying & cleaning
Versioning
Take advantage of native tools on ML platforms
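The loading, tidying and versioning steps could look something like the sketch below. It assumes the {palmerpenguins}, {dplyr}, {tidyr} and {pins} packages are installed. The temporary board and the pin name "penguins_tidy" are illustrative choices, not necessarily what demo.R uses.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)
library(pins)

# Load the data and keep the target plus the three predictors
penguins_tidy <- penguins |>
  select(species, island, flipper_length_mm, body_mass_g) |>
  drop_na()  # drop rows with missing values

# Assumption: a temporary, versioned board stands in for a real one
data_board <- board_temp(versioned = TRUE)
pin_write(data_board, penguins_tidy, name = "penguins_tidy", type = "rds")

# List the stored versions of the tidied data
data_versions <- pin_versions(data_board, "penguins_tidy")
data_versions
```

A temporary board disappears when the R session ends, so a persistent board (a folder or a platform's own storage) would be the natural replacement outside a demo.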
Our model object can now be used to predict species:
v_model is a list with six elements
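As a rough sketch of how such a model object might be built: the code below fits a {tidymodels} workflow and wraps it with vetiver_model(). The decision tree and the name "penguins_model" are assumptions made for illustration; the workshop's demo.R may use a different model type and name.

```r
library(tidymodels)
library(vetiver)
library(palmerpenguins)

penguins_tidy <- penguins |>
  select(species, island, flipper_length_mm, body_mass_g) |>
  drop_na()

# Assumption: a decision tree stands in for the workshop's model
tree_spec <- decision_tree(mode = "classification") |>
  set_engine("rpart")

fitted_wf <- workflow() |>
  add_formula(species ~ island + flipper_length_mm + body_mass_g) |>
  add_model(tree_spec) |>
  fit(data = penguins_tidy)

# Wrap the fitted workflow as a deployable vetiver model
v_model <- vetiver_model(fitted_wf, model_name = "penguins_model")
names(v_model)  # inspect the elements of the list

# Predict species for a few rows of predictor data
new_penguins <- penguins_tidy |>
  select(-species) |>
  head()
preds <- predict(v_model, new_penguins)
preds
```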
Attempt “Task 2: Modelling”
Choosing the right model can be tough!
Versioning
Retrieve a model
Inspect the stored versions
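Versioning, retrieving and inspecting a model might look like the sketch below. It uses a temporary versioned {pins} board, and the model type and the name "penguins_model" are again illustrative assumptions.

```r
library(tidymodels)
library(vetiver)
library(pins)
library(palmerpenguins)

penguins_tidy <- penguins |>
  select(species, island, flipper_length_mm, body_mass_g) |>
  drop_na()

fitted_wf <- workflow() |>
  add_formula(species ~ island + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_tidy)

v_model <- vetiver_model(fitted_wf, model_name = "penguins_model")

# Versioning: store the model on a versioned board
model_board <- board_temp(versioned = TRUE)
vetiver_pin_write(model_board, v_model)

# Inspect the stored versions
model_versions <- pin_versions(model_board, "penguins_model")
model_versions

# Retrieve a model, here by explicit version identifier
v_retrieved <- vetiver_pin_read(
  model_board,
  "penguins_model",
  version = model_versions$version[1]
)
```

Leaving out the version argument should return the latest stored version.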
We deploy models as APIs which take input data and send back model predictions.
We can use a {plumber} API to deploy a {vetiver} model.
Check the deployment with:
Checking that our API works!
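A minimal local deployment could be sketched as follows. The port 8080, the helper check_api() and the model details are assumptions for illustration. Note that pr_run() blocks the R session, so the check would normally be run from a second session.

```r
library(tidymodels)
library(vetiver)
library(plumber)
library(palmerpenguins)

penguins_tidy <- penguins |>
  select(species, island, flipper_length_mm, body_mass_g) |>
  drop_na()

fitted_wf <- workflow() |>
  add_formula(species ~ island + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_tidy)

v_model <- vetiver_model(fitted_wf, model_name = "penguins_model")

# Build a plumber router with a /predict endpoint for the model
api <- pr() |>
  vetiver_api(v_model)

# Serve the API locally (blocks the session, so only when interactive)
if (interactive()) {
  pr_run(api, port = 8080)
}

# Hypothetical helper: run from a second R session while the API is up
check_api <- function(new_data, url = "http://127.0.0.1:8080/predict") {
  endpoint <- vetiver_endpoint(url)
  predict(endpoint, new_data)
}
```

Calling check_api() with a few rows of predictor data should return the same predictions as calling predict() on the model directly.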
Attempt “Task 3: Deploying your model”
Vetiver is available for both Python and R!
In Python you would use Python ML libraries rather than {tidymodels}
Vetiver documentation: vetiver.posit.co
Try deploying locally to check that your model API works as expected.
Use environment managers like {renv} to store model dependencies.
Use containers like Docker to bundle model source code with dependencies.
Our Dockerfile contains a series of commands to:
Set the R version and install the system libraries.
Install the required R packages.
Run the API in the deployed environment.
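One way to generate such a Dockerfile from R is vetiver_prepare_docker(), which, as far as we understand, writes a plumber file, an {renv} lockfile and a Dockerfile for a pinned model. The sketch below assumes {renv} is installed. The output folder, model details and temporary board are illustrative only: a real deployment needs a board that is reachable from inside the container.

```r
library(tidymodels)
library(vetiver)
library(pins)
library(palmerpenguins)

penguins_tidy <- penguins |>
  select(species, island, flipper_length_mm, body_mass_g) |>
  drop_na()

fitted_wf <- workflow() |>
  add_formula(species ~ island + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_tidy)

v_model <- vetiver_model(fitted_wf, model_name = "penguins_model")

# Assumption: a temporary board, used only so the sketch runs end to end
model_board <- board_temp(versioned = TRUE)
vetiver_pin_write(model_board, v_model)

# Write the deployment files into a scratch folder
docker_dir <- file.path(tempdir(), "penguins_docker")
dir.create(docker_dir, showWarnings = FALSE)
vetiver_prepare_docker(model_board, "penguins_model", path = docker_dir)

list.files(docker_dir)
```

The generated Dockerfile can then be built and run with the usual Docker command line tools.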
Cloud MLOps normally doesn’t come free…
Some platforms offer free trials (e.g., SageMaker).
May be cheaper if you’re already invested in a particular cloud platform.
Costs can rise depending on computational resources consumed.
Model building and deployment use different environments.
Deployment is just the beginning…
Why should a deployed model be closely monitored?
What warning signs would you look out for?
Discuss
Bonus points if you contribute…
Attempt “Task 4: Detecting model drift”
As your data grows, run regular checks of model performance.
Monitor key model metrics over time.
You may notice a downward trend…
Retrain the model with the latest data and redeploy.
As your data and user base grow, your model needs to scale.
Upgrade your computational resources.
Consider moving from a relational database to a data warehouse.
Check how many users your license (AWS, Posit, etc.) permits.
Vetiver has built-in functions to track scoring metrics over time.
Requires a time variable in the dataset.
Load the model from your {pins} board.
Make sure you are scoring the deployed version.
Specify the period for scoring (weeks, months, years, …).
Model metrics can also be stored with {pins}!
Consider our life expectancy data from the exercises…
Compute scoring metrics over a specified period:
Requires a Date column (generate from Year).
recent_data: could be data from the past year or all historical data.
Pin the metrics
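The scoring steps above might be sketched as follows. The exercise data is not reproduced here, so the data frame is simulated and its column names (gdp, life_expectancy) are hypothetical. A plain lm() also stands in for the deployed model, which in practice would be loaded with vetiver_pin_read() and scored with augment().

```r
library(dplyr)
library(vetiver)
library(pins)

set.seed(42)

# Simulated stand-in for the life expectancy data (hypothetical columns)
life_df <- tibble(
  Year = rep(2000:2009, each = 20),
  gdp = runif(200, min = 1, max = 50),
  life_expectancy = 60 + 0.3 * gdp + rnorm(200)
) |>
  mutate(Date = as.Date(paste0(Year, "-01-01")))  # Date generated from Year

train_data <- filter(life_df, Year < 2005)
recent_data <- filter(life_df, Year >= 2005)

# Assumption: a simple linear model replaces the deployed model
model <- lm(life_expectancy ~ gdp, data = train_data)

# Add predictions, then compute metrics for each year of recent data
scored <- recent_data |>
  mutate(.pred = predict(model, newdata = recent_data))

metrics <- vetiver_compute_metrics(
  scored, Date, "year", life_expectancy, .pred
)
metrics

# Pin the metrics so they can be tracked over time
metrics_board <- board_temp()
pin_write(metrics_board, metrics, "life_expectancy_metrics", type = "rds")
```

On later runs, vetiver_pin_metrics() can be used to add newly computed metrics to the existing pin, and vetiver_plot_metrics() to plot them over time.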
Retraining and redeployment can happen at the click of a button.
Encourages good practices like model versioning and packaging of source code.
Reduces human error.
Well-defined and reproducible.
Consider whether it is worth the cost/effort before starting.
Remember to provide feedback!
Open this workshop in the conference app
Tap “More”
Tap “Feedback”
Fill in the form